xen.git
2 years ago Do not build the instruction emulator
Ian Jackson [Thu, 20 Sep 2018 17:10:14 +0000 (18:10 +0100)]
Do not build the instruction emulator

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>

2 years ago Remove static solaris support from pygrub
Bastian Blank [Sat, 5 Jul 2014 09:47:29 +0000 (11:47 +0200)]
Remove static solaris support from pygrub

Patch-Name: tools-pygrub-remove-static-solaris-support

Gbp-Pq: Topic misc
Gbp-Pq: Name tools-pygrub-remove-static-solaris-support

2 years ago Do not ship COPYING into /usr/include
Bastian Blank [Sat, 5 Jul 2014 09:47:30 +0000 (11:47 +0200)]
Do not ship COPYING into /usr/include

This is not wanted in Debian.  COPYING ends up in
/usr/share/doc/xen-*copyright.

Patch-Name: tools-include-no-COPYING.diff

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>

2 years ago config-prefix.diff
Bastian Blank [Sat, 5 Jul 2014 09:46:45 +0000 (11:46 +0200)]
config-prefix.diff

Patch-Name: config-prefix.diff

Gbp-Pq: Topic prefix-abiname
Gbp-Pq: Name config-prefix.diff

2 years ago Display Debian package version in hypervisor log
Bastian Blank [Sat, 5 Jul 2014 09:46:43 +0000 (11:46 +0200)]
Display Debian package version in hypervisor log

During hypervisor boot, disable the banner and nicely display the xen
version as well as the Maintainer address from debian/control.

For this to work the DEB_VERSION and DEB_MAINTAINER variables need to
be set by debian/rules.

Original patch by Bastian Blank <waldi@debian.org>
Modified by
Hans van Kranenburg <hans@knorrie.org>
Maximilian Engelhardt <maxi@daemonizer.de>

2 years ago Delete configure output
Ian Jackson [Wed, 19 Sep 2018 15:53:22 +0000 (16:53 +0100)]
Delete configure output

These autogenerated files are not useful in Debian; dh_autoreconf will
regenerate them.

If this patch does not apply when rebasing, you can simply delete the
files again.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>

2 years ago Delete config.sub and config.guess
Ian Jackson [Wed, 19 Sep 2018 15:45:49 +0000 (16:45 +0100)]
Delete config.sub and config.guess

dh_autoreconf will provide these back.

If this patch does not apply when rebasing, you can simply delete the
files again.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>

2 years ago debian/changelog: finish 4.17.0+74-g3eac216e6e-1
Maximilian Engelhardt [Thu, 23 Mar 2023 21:23:53 +0000 (22:23 +0100)]
debian/changelog: finish 4.17.0+74-g3eac216e6e-1

2 years ago Update changelog for new upstream 4.17.0+74-g3eac216e6e
Maximilian Engelhardt [Thu, 23 Mar 2023 21:21:38 +0000 (22:21 +0100)]
Update changelog for new upstream 4.17.0+74-g3eac216e6e

[git-debrebase changelog: new upstream 4.17.0+74-g3eac216e6e]

2 years ago Update to upstream 4.17.0+74-g3eac216e6e
Maximilian Engelhardt [Thu, 23 Mar 2023 21:21:38 +0000 (22:21 +0100)]
Update to upstream 4.17.0+74-g3eac216e6e

[git-debrebase anchor: new upstream 4.17.0+74-g3eac216e6e, merge]

2 years ago libacpi: fix PCI hotplug AML
David Woodhouse [Tue, 21 Mar 2023 12:47:52 +0000 (13:47 +0100)]
libacpi: fix PCI hotplug AML

The emulated PIIX3 uses a nybble for the status of each PCI function,
so the status for e.g. slot 0 functions 0 and 1 respectively can be
read as (\_GPE.PH00 & 0x0F), and (\_GPE.PH00 >> 0x04).

The AML that Xen gives to a guest gets the operand order for the odd-
numbered functions the wrong way round, returning (0x04 >> \_GPE.PH00)
instead.

As far as I can tell, this was the wrong way round in Xen from the
moment that PCI hotplug was first introduced in commit 83d82e6f35a8:

+                    ShiftRight (0x4, \_GPE.PH00, Local1)
+                    Return (Local1) /* IN status as the _STA */

Or maybe there's bizarre AML operand ordering going on there, like
Intel's wrong-way-round assembler, and it only broke later when it was
changed to being generated?

Either way, it's definitely wrong now, and instrumenting a Linux guest
shows that it correctly sees _STA being 0x00 in function 0 of an empty
slot, but then the loop in acpiphp_glue.c::get_slot_status() goes on to
look at function 1 and sees that _STA evaluates to 0x04. Thus reporting
an adapter is present in every slot in /sys/bus/pci/slots/*

Quite why Linux wants to look for function 1 being physically present
when function 0 isn't... I don't want to think about right now.

Fixes: 83d82e6f35a8 ("hvmloader: pass-through: multi-function PCI hot-plug")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b190af7d3e90f58da5f58044b8dea7261b8b483d
master date: 2023-03-20 17:12:34 +0100
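
Added example (not part of the commit or of Xen's libacpi): a C model of the two operand orders described above, with ShiftRight modelled so that a count of 64 or more yields 0. The function names and the sample byte values are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Model of AML ShiftRight(Source, ShiftCount): Source >> ShiftCount, with
 * every bit shifted out (result 0) once the count reaches the 64-bit width.
 * The guard also avoids C's undefined behaviour for oversized shifts. */
uint64_t aml_shift_right(uint64_t source, uint64_t count)
{
    return count >= 64 ? 0 : source >> count;
}

/* Even-numbered function: low nybble of the PHxx status byte. */
uint8_t sta_even(uint8_t ph)
{
    return ph & 0x0F;
}

/* Odd-numbered function as the AML had it: ShiftRight (0x4, PHxx). */
uint8_t sta_odd_buggy(uint8_t ph)
{
    return (uint8_t)aml_shift_right(0x4, ph);
}

/* Odd-numbered function with the operands the right way round. */
uint8_t sta_odd_fixed(uint8_t ph)
{
    return (uint8_t)aml_shift_right(ph, 0x4);
}

/* Number of PHxx byte values (0..255) for which the two decodings disagree. */
int count_odd_mismatches(void)
{
    int ph, n = 0;

    for (ph = 0; ph < 256; ph++)
        if (sta_odd_buggy((uint8_t)ph) != sta_odd_fixed((uint8_t)ph))
            n++;
    return n;
}

int main(void)
{
    /* Constructed sample bytes: all zero, function 0 only, function 1 only, both. */
    static const uint8_t samples[] = { 0x00, 0x01, 0x10, 0x21 };
    size_t i;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        printf("PH=0x%02x  fn0=0x%x  fn1 buggy=0x%x  fn1 fixed=0x%x\n",
               (unsigned)samples[i], (unsigned)sta_even(samples[i]),
               (unsigned)sta_odd_buggy(samples[i]),
               (unsigned)sta_odd_fixed(samples[i]));

    printf("odd-function decodings disagree for %d of 256 byte values\n",
           count_odd_mismatches());
    return 0;
}
```

In this model the reversed operands give 0x04 for an all-zero byte and 0 whenever the high nybble is actually set, so the odd function is wrong in both directions.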

2 years ago bunzip: work around gcc13 warning
Jan Beulich [Tue, 21 Mar 2023 12:47:11 +0000 (13:47 +0100)]
bunzip: work around gcc13 warning

While provable that length[0] is always initialized (because symCount
cannot be zero), upcoming gcc13 fails to recognize this and warns about
the unconditional use of the value immediately following the loop.

See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511.

Reported-by: Martin Liška <martin.liska@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 402195e56de0aacf97e05c80ed367d464ca6938b
master date: 2023-03-14 10:45:28 +0100

2 years ago VT-d: constrain IGD check
Jan Beulich [Tue, 21 Mar 2023 12:46:39 +0000 (13:46 +0100)]
VT-d: constrain IGD check

Marking a DRHD as controlling an IGD isn't very sensible without
checking that at the very least it's a graphics device that lives at
0000:00:02.0. Re-use the reading of the class-code to control both the
clearing of "gfx_only" and the setting of "igd_drhd_address".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: f8c4317295fa1cde1a81779b7e362651c084efb8
master date: 2023-03-14 10:44:08 +0100

2 years ago x86/altp2m: help gcc13 to avoid it emitting a warning
Jan Beulich [Tue, 21 Mar 2023 12:45:47 +0000 (13:45 +0100)]
x86/altp2m: help gcc13 to avoid it emitting a warning

Switches of altp2m-s always expect a valid altp2m to be in place (and
indeed altp2m_vcpu_initialise() sets the active one to be at index 0).
The compiler, however, cannot know that, and hence it cannot eliminate
p2m_get_altp2m()'s case of returning (literal) NULL. If then the compiler
decides to special case that code path in the caller, the dereference in
instances of

    atomic_dec(&p2m_get_altp2m(v)->active_vcpus);

can, to the code generator, appear to be NULL dereferences, leading to

In function 'atomic_dec',
    inlined from '...' at ...:
./arch/x86/include/asm/atomic.h:182:5: error: array subscript 0 is outside array bounds of 'int[0]' [-Werror=array-bounds=]

Aid the compiler by adding a BUG_ON() checking the return value of the
problematic p2m_get_altp2m(). Since with the use of the local variable
the 2nd p2m_get_altp2m() each will look questionable at the first glance
(Why is the local variable not used here?), open-code the only relevant
piece of p2m_get_altp2m() there.

To avoid repeatedly doing these transformations, and also to limit how
"bad" the open-coding really is, convert the entire operation to an
inline helper, used by all three instances (and accepting the redundant
BUG_ON(idx >= MAX_ALTP2M) in two of the three cases).

Reported-by: Charles Arnold <carnold@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: be62b1fc2aa7375d553603fca07299da765a89fe
master date: 2023-03-13 15:16:21 +0100

2 years ago core-parking: fix build with gcc12 and NR_CPUS=1
Jan Beulich [Tue, 21 Mar 2023 12:44:53 +0000 (13:44 +0100)]
core-parking: fix build with gcc12 and NR_CPUS=1

Gcc12 takes issue with core_parking_remove()'s

    for ( ; i < cur_idle_nums; ++i )
        core_parking_cpunum[i] = core_parking_cpunum[i + 1];

complaining that the right hand side array access is past the bounds of
1. Clearly the compiler can't know that cur_idle_nums can only ever be
zero in this case (as the sole CPU cannot be parked).

Arrange for core_parking.c's contents to not be needed altogether, and
then disable its building when NR_CPUS == 1.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 4b0422f70feb4b1cd04598ffde805fc224f3812e
master date: 2023-03-13 15:15:42 +0100

2 years ago x86/spec-ctrl: Add BHI controls to userspace components
Andrew Cooper [Tue, 21 Mar 2023 12:44:27 +0000 (13:44 +0100)]
x86/spec-ctrl: Add BHI controls to userspace components

This was an oversight when adding the Xen parts.

Fixes: cea9ae062295 ("x86/spec-ctrl: Enumeration for new Intel BHI controls")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 9276e832aef60437da13d91e66fc259fd94d6f91
master date: 2023-03-13 11:26:26 +0000

2 years ago tools/xenmon: Fix xenmon.py for with python3.x
Bernhard Kaindl [Tue, 21 Mar 2023 12:44:04 +0000 (13:44 +0100)]
tools/xenmon: Fix xenmon.py for with python3.x

Fixes for Py3:
* class Delayed(): file not defined; also an error for pylint -E.  Inherit
  object instead for Py2 compatibility.  Fix DomainInfo() too.
* Inconsistent use of tabs and spaces for indentation (in one block)

Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 3a59443c1d5ae0677a792c660ccd3796ce036732
master date: 2023-02-06 10:22:12 +0000

2 years ago tools/python: change 's#' size type for Python >= 3.10
Marek Marczykowski-Górecki [Tue, 21 Mar 2023 12:43:44 +0000 (13:43 +0100)]
tools/python: change 's#' size type for Python >= 3.10

Python < 3.10 by default uses 'int' type for data+size string types
(s#), unless PY_SSIZE_T_CLEAN is defined - in which case it uses
Py_ssize_t. The former behavior was removed in Python 3.10 and now it's
required to define PY_SSIZE_T_CLEAN before including Python.h, and using
Py_ssize_t for the length argument. The PY_SSIZE_T_CLEAN behavior is
supported since Python 2.5.

Adjust bindings accordingly.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: 897257ba49d0a6ddcf084960fd792ccce9c40f94
master date: 2023-02-06 08:50:13 +0100

2 years ago x86/vmx: implement Notify VM Exit
Roger Pau Monné [Tue, 21 Mar 2023 12:42:43 +0000 (13:42 +0100)]
x86/vmx: implement Notify VM Exit

Under certain conditions guests can get the CPU stuck in an unbounded
loop without the possibility of an interrupt window to occur on
instruction boundary.  This was the case with the scenarios described
in XSA-156.

Make use of the Notify VM Exit mechanism, that will trigger a VM Exit
if no interrupt window occurs for a specified amount of time.  Note
that using the Notify VM Exit avoids having to trap #AC and #DB
exceptions, as Xen is guaranteed to get a VM Exit even if the guest
puts the CPU in a loop without an interrupt window, as such disable
the intercepts if the feature is available and enabled.

Setting the notify VM exit window to 0 is safe because there's a
threshold added by the hardware in order to have a sane window value.

Note the handling of EXIT_REASON_NOTIFY in the nested virtualization
case is passed to L0, and hence a nested guest being able to trigger a
notify VM exit with an invalid context would be able to crash the L1
hypervisor (by L0 destroying the domain).  Since we don't expose VM
Notify support to L1 it should already enable the required
protections in order to prevent VM Notify from triggering in the first
place.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

x86/vmx: Partially revert "x86/vmx: implement Notify VM Exit"

The original patch tried to do two things - implement VMNotify, and
re-optimise VT-x to not intercept #DB/#AC by default.

The second part is buggy in multiple ways.  Both GDBSX and Introspection need
to conditionally intercept #DB, which was not accounted for.  Also, #DB
interception has nothing at all to do with cpu_has_monitor_trap_flag.

Revert the second half, leaving #DB/#AC intercepted unilaterally, but with
VMNotify active by default when available.

Fixes: 573279cde1c4 ("x86/vmx: implement Notify VM Exit")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 573279cde1c4e752d4df34bc65ffafa17573148e
master date: 2022-12-19 11:24:14 +0100
master commit: 5f08bc9404c7cfa8131e262c7dbcb4d96c752686
master date: 2023-01-20 19:39:32 +0000

2 years ago x86/vmx: introduce helper to set VMX_INTR_SHADOW_NMI
Roger Pau Monné [Tue, 21 Mar 2023 12:41:49 +0000 (13:41 +0100)]
x86/vmx: introduce helper to set VMX_INTR_SHADOW_NMI

Introduce a small helper to OR VMX_INTR_SHADOW_NMI in
GUEST_INTERRUPTIBILITY_INFO in order to help dealing with the NMI
unblocked by IRET case.  Replace the existing usage in handling
EXIT_REASON_EXCEPTION_NMI and also add such handling to EPT violations
and page-modification log-full events.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: d329b37d12132164c3894d0b6284be72576ef950
master date: 2022-12-19 11:23:34 +0100

2 years ago x86/vmx: implement VMExit based guest Bus Lock detection
Roger Pau Monné [Tue, 21 Mar 2023 12:40:41 +0000 (13:40 +0100)]
x86/vmx: implement VMExit based guest Bus Lock detection

Add support for enabling guest Bus Lock Detection on Intel systems.
Such detection works by triggering a vmexit, which ought to be enough
of a pause to prevent a guest from abusing the Bus Lock.

Add an extra Xen perf counter to track the number of Bus Locks detected.
This is done because Bus Locks can also be reported by setting the bit
26 in the exit reason field, so also account for those.

Note EXIT_REASON_BUS_LOCK VMExits will always have bit 26 set in
exit_reason, and hence the performance counter doesn't need to be
increased for EXIT_REASON_BUS_LOCK handling.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: f7d07619d2ae0382e2922e287fbfbb27722f3f0b
master date: 2022-12-19 11:22:43 +0100

2 years ago x86/spec-ctrl: Defer CR4_PV32_RESTORE on the cstar_enter path
Andrew Cooper [Fri, 10 Feb 2023 21:11:14 +0000 (21:11 +0000)]
x86/spec-ctrl: Defer CR4_PV32_RESTORE on the cstar_enter path

As stated (correctly) by the comment next to SPEC_CTRL_ENTRY_FROM_PV, between
the two hunks visible in the patch, RET's are not safe prior to this point.

CR4_PV32_RESTORE hides a CALL/RET pair in certain configurations (PV32
compiled in, SMEP or SMAP active), and the RET can be attacked with one of
several known speculative issues.

Furthermore, CR4_PV32_RESTORE also hides a reference to the cr4_pv32_mask
global variable, which is not safe when XPTI is active before restoring Xen's
full pagetables.

This crash has gone unnoticed because it is only AMD CPUs which permit the
SYSCALL instruction in compatibility mode, and these are not vulnerable to
Meltdown so don't activate XPTI by default.

This is XSA-429 / CVE-2022-42331

Fixes: 5e7962901131 ("x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point")
Fixes: 5784de3e2067 ("x86: Meltdown band-aid against malicious 64-bit PV guests")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit df5b055b12116d9e63ced59ae5389e69a2a3de48)

2 years ago x86/HVM: serialize pinned cache attribute list manipulation
Jan Beulich [Tue, 21 Mar 2023 12:01:01 +0000 (12:01 +0000)]
x86/HVM: serialize pinned cache attribute list manipulation

While the RCU variants of list insertion and removal allow lockless list
traversal (with RCU just read-locked), insertions and removals still
need serializing amongst themselves. To keep things simple, use the
domain lock for this purpose.

This is CVE-2022-42334 / part of XSA-428.

Fixes: 642123c5123f ("x86/hvm: provide XEN_DMOP_pin_memory_cacheattr")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 829ec245cf66560e3b50d140ccb3168e7fb7c945)

2 years ago x86/HVM: bound number of pinned cache attribute regions
Jan Beulich [Tue, 21 Mar 2023 12:01:01 +0000 (12:01 +0000)]
x86/HVM: bound number of pinned cache attribute regions

This is exposed via DMOP, i.e. to potentially not fully privileged
device models. With that we may not permit registration of an (almost)
unbounded amount of such regions.

This is CVE-2022-42333 / part of XSA-428.

Fixes: 642123c5123f ("x86/hvm: provide XEN_DMOP_pin_memory_cacheattr")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit a5e768640f786b681063f4e08af45d0c4e91debf)

2 years ago x86/shadow: account for log-dirty mode when pre-allocating
Jan Beulich [Tue, 21 Mar 2023 11:58:50 +0000 (11:58 +0000)]
x86/shadow: account for log-dirty mode when pre-allocating

Pre-allocation is intended to ensure that in the course of constructing
or updating shadows there won't be any risk of just made shadows or
shadows being acted upon can disappear under our feet. The amount of
pages pre-allocated then, however, needs to account for all possible
subsequent allocations. While the use in sh_page_fault() accounts for
all shadows which may need making, so far it didn't account for
allocations coming from log-dirty tracking (which piggybacks onto the
P2M allocation functions).

Since shadow_prealloc() takes a count of shadows (or other data
structures) rather than a count of pages, putting the adjustment at the
call site of this function won't work very well: We simply can't express
the correct count that way in all cases. Instead take care of this in
the function itself, by "snooping" for L1 type requests. (While not
applicable right now, future new request sites of L1 tables would then
also be covered right away.)

It is relevant to note here that pre-allocations like the one done from
shadow_alloc_p2m_page() are benign when they fall in the "scope" of an
earlier pre-alloc which already included that count: The inner call will
simply find enough pages available then; it'll bail right away.

This is CVE-2022-42332 / XSA-427.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
(cherry picked from commit 91767a71061035ae42be93de495cd976f863a41a)

2 years ago x86/ucode/AMD: late load the patch on every logical thread
Sergey Dyasli [Fri, 3 Mar 2023 07:03:43 +0000 (08:03 +0100)]
x86/ucode/AMD: late load the patch on every logical thread

Currently late ucode loading is performed only on the first core of CPU
siblings.  But according to the latest recommendation from AMD, late
ucode loading should happen on every logical thread/core on AMD CPUs.

To achieve that, introduce is_cpu_primary() helper which will consider
every logical cpu as "primary" when running on AMD CPUs.  Also include
Hygon in the check for future-proofing.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f1315e48a03a42f78f9b03c0a384165baf02acae
master date: 2023-02-28 14:51:28 +0100

2 years ago libs/guest: Fix leak on realloc failure in backup_ptes()
Edwin Török [Fri, 3 Mar 2023 07:03:19 +0000 (08:03 +0100)]
libs/guest: Fix leak on realloc failure in backup_ptes()

From `man 2 realloc`:

  If realloc() fails, the original block is left untouched; it is not freed or moved.

Found using GCC -fanalyzer:

  |  184 |         backup->entries = realloc(backup->entries,
  |      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  |      |         |               | |
  |      |         |               | (91) when ‘realloc’ fails
  |      |         |               (92) ‘old_ptes.entries’ leaks here; was allocated at (44)
  |      |         (90) ...to here

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 275d13184cfa52ebe4336ed66526ce93716adbe0
master date: 2023-02-27 15:51:23 +0000
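
Added example (not the libxenguest code): the `p = realloc(p, n)` pattern from the analyzer trace next to the temporary-pointer form, exercised with an injected allocation failure. `struct pte_backup`, `faulty_realloc()` and the grow helpers are hypothetical names, and the sizes are constructed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the backup structure named in the trace. */
struct pte_backup {
    unsigned long *entries;
    size_t max;                 /* allocated slots */
    size_t cur;                 /* used slots */
};

/* Fault injection: when set, the next faulty_realloc() call fails. */
static int fail_next_realloc;

void *faulty_realloc(void *p, size_t n)
{
    if (fail_next_realloc) {
        fail_next_realloc = 0;
        return NULL;            /* as with realloc(): old block left untouched */
    }
    return realloc(p, n);
}

/* Flagged pattern: on failure the only pointer to the old block is
 * overwritten with NULL, so the caller can no longer free it. */
int grow_leaky(struct pte_backup *b)
{
    b->entries = faulty_realloc(b->entries, 2 * b->max * sizeof(*b->entries));
    if (!b->entries)
        return -1;
    b->max *= 2;
    return 0;
}

/* Safe pattern: keep the old pointer until the new one is known to be good. */
int grow_safe(struct pte_backup *b)
{
    unsigned long *tmp =
        faulty_realloc(b->entries, 2 * b->max * sizeof(*b->entries));

    if (!tmp)
        return -1;              /* b->entries is still valid and still owned */
    b->entries = tmp;
    b->max *= 2;
    return 0;
}

/* Returns 1 if the struct still owns its original block after a failed
 * grow, 0 if the pointer was lost, -1 if the initial allocation failed. */
int survives_failed_grow(int (*grow)(struct pte_backup *))
{
    struct pte_backup b = { .max = 4, .cur = 4 };
    unsigned long *orig;
    int rc, owned;

    b.entries = calloc(b.max, sizeof(*b.entries));
    if (!b.entries)
        return -1;
    orig = b.entries;

    fail_next_realloc = 1;
    rc = grow(&b);
    owned = (rc == -1 && b.entries == orig && b.max == 4);

    /* The demo kept its own copy so nothing leaks here; a real caller of
     * the leaky variant has no such copy. */
    free(orig);
    return owned;
}

int main(void)
{
    printf("leaky: struct still owns block after failure? %d\n",
           survives_failed_grow(grow_leaky));
    printf("safe:  struct still owns block after failure? %d\n",
           survives_failed_grow(grow_safe));
    return 0;
}
```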

2 years ago libs/guest: Fix resource leaks in xc_core_arch_map_p2m_tree_rw()
Andrew Cooper [Fri, 3 Mar 2023 07:02:59 +0000 (08:02 +0100)]
libs/guest: Fix resource leaks in xc_core_arch_map_p2m_tree_rw()

Edwin, with the help of GCC's -fanalyzer, identified that p2m_frame_list_list
gets leaked.  What fanalyzer can't see is that the live_p2m_frame_list_list
and live_p2m_frame_list foreign mappings are leaked too.

Rework the logic so the out path is executed unconditionally, which cleans up
all the intermediate allocations/mappings appropriately.

Fixes: bd7a29c3d0b9 ("tools/libs/ctrl: fix xc_core_arch_map_p2m() to support linear p2m table")
Reported-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
master commit: 1868d7f22660c8980bd0a7e53f044467e8b63bb5
master date: 2023-02-27 15:51:23 +0000

2 years ago tools: Use PKG_CONFIG_FILE instead of PKG_CONFIG variable
Bertrand Marquis [Fri, 3 Mar 2023 07:02:30 +0000 (08:02 +0100)]
tools: Use PKG_CONFIG_FILE instead of PKG_CONFIG variable

Replace PKG_CONFIG variable name with PKG_CONFIG_FILE for the name of
the pkg-config file.
This is preventing a conflict in some build systems where PKG_CONFIG
actually contains the path to the pkg-config executable to use, as the
default assignment in libs.mk is using a weak assignment (?=).

This problem has been found when trying to build the latest version of
Xen tools using buildroot.

Fixes: d400dc5729e4 ("tools: tweak tools/libs/libs.mk for being able to support libxenctrl")
Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: b97e2fe7b9e1f4706693552697239ac2b71efee4
master date: 2023-02-24 17:44:29 +0000

2 years ago xen: Fix Clang -Wunicode diagnostic when building asm-macros
Andrew Cooper [Fri, 3 Mar 2023 07:01:21 +0000 (08:01 +0100)]
xen: Fix Clang -Wunicode diagnostic when building asm-macros

While trying to work around a different Clang-IAS bug (parent changeset), I
stumbled onto:

  In file included from arch/x86/asm-macros.c:3:
  ./arch/x86/include/asm/spec_ctrl_asm.h:144:19: error: \u used with
  no following hex digits; treating as '\' followed by identifier [-Werror,-Wunicode]
  .L\@_fill_rsb_loop\uniq:
                    ^

It turns out that Clang -E is sensitive to the file extension of the source
file it is processing.  Furthermore, C explicitly permits the use of \u
escapes in identifier names, so the diagnostic would be reasonable in
principle if we were trying to compile the result.

asm-macros should really have been .S from the outset, as it is ultimately
generating assembly, not C.  Rename it, which causes Clang not to complain.

We need to introduce rules for generating a .i file from .S, and substituting
c_flags for a_flags lets us drop the now-redundant -D__ASSEMBLY__.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 53f0d02040b1df08f0589f162790ca376e1c2040
master date: 2023-02-24 17:44:29 +0000

2 years ago xen: Work around Clang-IAS macro \@ expansion bug
Andrew Cooper [Fri, 3 Mar 2023 07:00:04 +0000 (08:00 +0100)]
xen: Work around Clang-IAS macro \@ expansion bug

https://github.com/llvm/llvm-project/issues/60792

It turns out that Clang-IAS does not expand \@ uniquely in a translation
unit, and the XSA-426 change tickles this bug:

  <instantiation>:4:1: error: invalid symbol redefinition
  .L1_fill_rsb_loop:
  ^
  make[3]: *** [Rules.mk:247: arch/x86/acpi/cpu_idle.o] Error 1

Extend DO_OVERWRITE_RSB with an optional parameter so C callers can mix %= in
too, which Clang does seem to expand properly.

Fixes: 63305e5392ec ("x86/spec-ctrl: Mitigate Cross-Thread Return Address Predictions")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: a2adacff0b91cc7b977abb209dc419a2ef15963f
master date: 2023-02-24 17:44:29 +0000

2 years ago x86: perform mem_sharing teardown before paging teardown
Tamas K Lengyel [Fri, 3 Mar 2023 06:59:08 +0000 (07:59 +0100)]
x86: perform mem_sharing teardown before paging teardown

An assert failure has been observed in p2m_teardown when performing vm
forking and then destroying the forked VM (p2m-basic.c:173). The assert
checks whether the domain's shared pages counter is 0. According to the
patch that originally added the assert (7bedbbb5c31) the p2m_teardown
should only happen after mem_sharing already relinquished all shared pages.

In this patch we flip the order in which relinquish ops are called to avoid
tripping the assert. Conceptually sharing being torn down makes sense to
happen before paging is torn down.

Fixes: e7aa55c0aab3 ("x86/p2m: free the paging memory pool preemptively")
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 2869349f0cb3a89dcbf1f1b30371f58df6309312
master date: 2023-02-23 12:35:48 +0100

2 years ago x86/ucode/AMD: apply the patch early on every logical thread
Sergey Dyasli [Fri, 3 Mar 2023 06:58:35 +0000 (07:58 +0100)]
x86/ucode/AMD: apply the patch early on every logical thread

The original issue has been reported on AMD Bulldozer-based CPUs where
ucode loading loses the LWP feature bit in order to gain the IBPB bit.
LWP disabling is per-SMT/CMT core modification and needs to happen on
each sibling thread despite the shared microcode engine. Otherwise,
logical CPUs will end up with different cpuid capabilities.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216211
Guests running under Xen happen to be not affected because of levelling
logic for the feature masking/override MSRs which causes the LWP bit to
fall out and hides the issue. The latest recommendation from AMD, after
discussing this bug, is to load ucode on every logical CPU.

In Linux kernel this issue has been addressed by e7ad18d1169c
("x86/microcode/AMD: Apply the patch early on every logical thread").
Follow the same approach in Xen.

Introduce SAME_UCODE match result and use it for early AMD ucode
loading. Take this opportunity and move opt_ucode_allow_same out of
compare_revisions() to the relevant callers and also modify the warning
message based on it. Intel's side of things is modified for consistency
but provides no functional change.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f4ef8a41b80831db2136bdaff9f946a1a4b051e7
master date: 2023-02-21 15:08:05 +0100

2 years ago build: make FILE symbol paths consistent
Ross Lagerwall [Fri, 3 Mar 2023 06:58:12 +0000 (07:58 +0100)]
build: make FILE symbol paths consistent

The FILE symbols in out-of-tree builds may be either a relative path to
the object dir or an absolute path depending on how the build is
invoked. Fix the paths for C files so that they are consistent with
in-tree builds - the path is relative to the "xen" directory (e.g.
common/irq.c).

This fixes livepatch builds when the original Xen build was out-of-tree
since livepatch-build always does in-tree builds. Note that this doesn't
fix the behaviour for Clang < 6 which always embeds full paths.

Fixes: 7115fa562fe7 ("build: adding out-of-tree support to the xen build")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 5b9bb91abba7c983def3b4bef71ab08ad360a242
master date: 2023-02-15 16:13:49 +0100

2 years ago credit2: respect credit2_runqueue=all when arranging runqueues
Marek Marczykowski-Górecki [Fri, 3 Mar 2023 06:57:39 +0000 (07:57 +0100)]
credit2: respect credit2_runqueue=all when arranging runqueues

Documentation for credit2_runqueue=all says it should create one queue
for all pCPUs on the host. But since introduction
sched_credit2_max_cpus_runqueue, it actually created separate runqueue
per socket, even if the CPUs count is below
sched_credit2_max_cpus_runqueue.

Adjust the condition to skip sibling check in case of
credit2_runqueue=all.

Fixes: 8e2aa76dc167 ("xen: credit2: limit the max number of CPUs in a runqueue")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
master commit: 1f5747ee929fbbcae58d7234c6c38a77495d0cfe
master date: 2023-02-15 16:12:42 +0100

2 years ago x86/shskt: Disable CET-SS on parts susceptible to fractured updates
Andrew Cooper [Fri, 3 Mar 2023 06:56:16 +0000 (07:56 +0100)]
x86/shskt: Disable CET-SS on parts susceptible to fractured updates

Refer to Intel SDM Rev 70 (Dec 2022), Vol3 17.2.3 "Supervisor Shadow Stack
Token".

Architecturally, an event delivery which starts in CPL<3 and switches shadow
stack will first validate the Supervisor Shadow Stack Token (setting the busy
bit), then pushes CS/LIP/SSP.  One example of this is an NMI interrupting Xen.

Some CPUs suffer from an issue called fracturing, whereby a fault/vmexit/etc
between setting the busy bit and completing the event injection renders the
action non-restartable, because when it comes time to restart, the busy bit is
found to be already set.

This is far more easily encountered under virt, yet it is not the fault of the
hypervisor, nor the fault of the guest kernel.  The fault lies somewhere
between the architectural specification, and the uarch behaviour.

Intel have allocated CPUID.7[1].ecx[18] CET_SSS to enumerate that supervisor
shadow stacks are safe to use.  Because of how Xen lays out its shadow stacks,
fracturing is not expected to be a problem on native.

Detect this case on boot and default to not using shstk if virtualised.
Specifying `cet=shstk` on the command line will override this heuristic and
enable shadow stacks irrespective.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 01e7477d1b081cff4288ff9f51ec59ee94c03ee0
master date: 2023-02-09 18:26:17 +0000

2 years ago x86/cpuid: Infrastructure for leaves 7:1{ecx,edx}
Andrew Cooper [Fri, 3 Mar 2023 06:55:54 +0000 (07:55 +0100)]
x86/cpuid: Infrastructure for leaves 7:1{ecx,edx}

We don't actually need ecx yet, but adding it in now will reduce the amount to
which leaf 7 is out of order in a featureset.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b4a23bf6293aadecfd03bf9e83974443e2eac9cb
master date: 2023-02-09 18:26:17 +0000

2 years ago libs/util: Fix parallel build between flex/bison and CC rules
Anthony PERARD [Fri, 3 Mar 2023 06:55:24 +0000 (07:55 +0100)]
libs/util: Fix parallel build between flex/bison and CC rules

flex/bison generate two targets, and when those targets are
prerequisite of other rules they are considered independently by make.

We can have a situation where the .c file is out-of-date but not the
.h, git checkout for example. In this case, if a rule only has the .h
file as prerequisite, make will proceed and start to build the object.
In parallel, another target can have the .c file as prerequisite and
make will find out it needs re-generating and do so, changing the .h at
the same time. This parallel task breaks the first one.

To avoid this scenario, we put both the header and the source as
prerequisite for all objects even if they only need the header.

Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: bf652a50fb3bb3b1b3d93db6fb79bc28f978fe75
master date: 2023-02-09 18:26:17 +0000

2 years ago debian/changelog: finish 4.17.0+46-gaaf74a532c-1
Hans van Kranenburg [Fri, 24 Feb 2023 17:08:07 +0000 (18:08 +0100)]
debian/changelog: finish 4.17.0+46-gaaf74a532c-1

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
2 years agodebian/changelog: Remove duplicate 'Note that'
Hans van Kranenburg [Fri, 10 Feb 2023 12:59:21 +0000 (13:59 +0100)]
debian/changelog: Remove duplicate 'Note that'

This was already included in the changelog for 4.17.0-1 :(

Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
2 years agodebian/changelog: Fix bug number typo.
Hans van Kranenburg [Fri, 10 Feb 2023 12:57:58 +0000 (13:57 +0100)]
debian/changelog: Fix bug number typo.

00:30 < Maxi[m]> Knorrie: I just noticed, the "(Closes: #102983)" from
   our changelog is missing a 0 at the end.
00:30 -zwiebelbot:#debian-xen- Debian#102983:
   quantlib_0.1.9-1(unstable): please add build-depends -
   https://bugs.debian.org/102983
00:31 < Maxi[m]> The correct bug number is #1029830

Oops. We will have to set it to done manually.

Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
2 years agodebian/shuffle-boot-files: fix typo
Hans van Kranenburg [Sat, 4 Feb 2023 16:57:23 +0000 (17:57 +0100)]
debian/shuffle-boot-files: fix typo

The tree picture changed, but I didn't correct the names in the text.
:-)

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
2 years agoUpdate changelog for new upstream 4.17.0+46-gaaf74a532c
Hans van Kranenburg [Fri, 24 Feb 2023 17:06:42 +0000 (18:06 +0100)]
Update changelog for new upstream 4.17.0+46-gaaf74a532c

[git-debrebase changelog: new upstream 4.17.0+46-gaaf74a532c]

2 years agoUpdate to upstream 4.17.0+46-gaaf74a532c
Hans van Kranenburg [Fri, 24 Feb 2023 17:06:42 +0000 (18:06 +0100)]
Update to upstream 4.17.0+46-gaaf74a532c

[git-debrebase anchor: new upstream 4.17.0+46-gaaf74a532c, merge]

2 years agod/changelog: finish 4.17.0+24-g2f8851c37f-2
Hans van Kranenburg [Mon, 6 Feb 2023 13:41:15 +0000 (14:41 +0100)]
d/changelog: finish 4.17.0+24-g2f8851c37f-2

2 years agochangelog: Prepare for upload to experimental
Ian Jackson [Sun, 5 Feb 2023 13:08:06 +0000 (13:08 +0000)]
changelog: Prepare for upload to experimental

2 years agoautomation: Remove clang-8 from Debian unstable container
Anthony PERARD [Tue, 21 Feb 2023 16:55:38 +0000 (16:55 +0000)]
automation: Remove clang-8 from Debian unstable container

First, apt complains that it isn't the right way to add keys anymore,
but hopefully that's just a warning.

Second, we can't install clang-8:
The following packages have unmet dependencies:
 clang-8 : Depends: libstdc++-8-dev but it is not installable
           Depends: libgcc-8-dev but it is not installable
           Depends: libobjc-8-dev but it is not installable
           Recommends: llvm-8-dev but it is not going to be installed
           Recommends: libomp-8-dev but it is not going to be installed
 libllvm8 : Depends: libffi7 (>= 3.3~20180313) but it is not installable
E: Unable to correct problems, you have held broken packages.

clang on Debian unstable is now version 14.0.6.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit a6b1e2b80fe2053b1c9c9843fb086a668513ea36)

3 years agox86/spec-ctrl: Mitigate Cross-Thread Return Address Predictions
Andrew Cooper [Thu, 8 Sep 2022 20:27:58 +0000 (21:27 +0100)]
x86/spec-ctrl: Mitigate Cross-Thread Return Address Predictions

This is XSA-426 / CVE-2022-27672

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 63305e5392ec2d17b85e7996a97462744425db80)

3 years agotools/ocaml/libs: Fix memory/resource leaks with caml_alloc_custom()
Andrew Cooper [Wed, 1 Feb 2023 11:27:42 +0000 (11:27 +0000)]
tools/ocaml/libs: Fix memory/resource leaks with caml_alloc_custom()

All caml_alloc_*() functions can throw exceptions, and longjump out of
context.  If this happens, we leak the xch/xce handle.

Reorder the logic to allocate the Ocaml object first.

Fixes: 8b3c06a3e545 ("tools/ocaml/xenctrl: OCaml 5 support, fix use-after-free")
Fixes: 22d5affdf0ce ("tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit d69ccf52ad467ccc22029172a8e61dc621187889)

3 years agotools/ocaml/xc: Don't reference Abstract_Tag objects with the GC lock released
Andrew Cooper [Tue, 31 Jan 2023 17:19:30 +0000 (17:19 +0000)]
tools/ocaml/xc: Don't reference Abstract_Tag objects with the GC lock released

The intf->{addr,len} references in the xc_map_foreign_range() call are unsafe.
From the manual:

  https://ocaml.org/manual/intfc.html#ss:parallel-execution-long-running-c-code

"After caml_release_runtime_system() was called and until
caml_acquire_runtime_system() is called, the C code must not access any OCaml
data, nor call any function of the run-time system, nor call back into OCaml
code."

More than what the manual says, the intf pointer is (potentially) invalidated
by caml_enter_blocking_section() if another thread happens to perform garbage
collection at just the right (wrong) moment.

Rewrite the logic.  There's no need to stash data in the Ocaml object until
the success path at the very end.

Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 9e7c74e6f9fd2e44df1212643b80af9032b45b07)

3 years agotools/ocaml/xc: Fix binding for xc_domain_assign_device()
Edwin Török [Thu, 12 Jan 2023 11:38:38 +0000 (11:38 +0000)]
tools/ocaml/xc: Fix binding for xc_domain_assign_device()

The patch adding this binding was plain broken, and unreviewed.  It modified
the C stub to add a 4th parameter without an equivalent adjustment in the
Ocaml side of the bindings.

In 64bit builds, this causes us to dereference whatever dead value is in %rcx
when trying to interpret the rflags parameter.

This has gone unnoticed because Xapi doesn't use this binding (it has its
own), but unbreak the binding by passing RDM_RELAXED unconditionally for
now (matching the libxl default behaviour).

Fixes: 9b34056cb4 ("tools: extend xc_assign_device() to support rdm reservation policy")
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 4250683842104f02996428f93927a035c8e19266)

3 years agotools/ocaml/evtchn: Don't reference Custom objects with the GC lock released
Edwin Török [Thu, 12 Jan 2023 17:48:29 +0000 (17:48 +0000)]
tools/ocaml/evtchn: Don't reference Custom objects with the GC lock released

The modification to the _H() macro for Ocaml 5 support introduced a subtle
bug.  From the manual:

  https://ocaml.org/manual/intfc.html#ss:parallel-execution-long-running-c-code

"After caml_release_runtime_system() was called and until
caml_acquire_runtime_system() is called, the C code must not access any OCaml
data, nor call any function of the run-time system, nor call back into OCaml
code."

Previously, the value was a naked C pointer, so dereferencing it wasn't
"accessing any Ocaml data", but the fix to avoid naked C pointers added a
layer of indirection through an Ocaml Custom object, meaning that the common
pattern of using _H() in a blocking section is unsafe.

In order to fix:

 * Drop the _H() macro and replace it with a static inline xce_of_val().
 * Opencode the assignment into Data_custom_val() in the two constructors.
 * Rename "value xce" parameters to "value xce_val" so we can consistently
   have "xenevtchn_handle *xce" on the stack, and obtain the pointer with the
   GC lock still held.

Fixes: 22d5affdf0ce ("tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak")
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 2636d8ff7a670c4d2485757dbe966e36c259a960)
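
Added illustration (not code from the Xen tree or from the commit authors): a toy C model of the hazard fixed by this commit and by the tools/ocaml/xc Abstract_Tag commit above, where a stub reads OCaml-heap data after the runtime lock is released. The `toy_*` functions, `struct cell` and the relocating `toy_gc_compact()` are assumptions standing in for the OCaml runtime; only the ordering of the pointer read relative to the blocking section follows the commit messages.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these names are OCaml runtime or Xen APIs. */

struct handle { int fd; };                /* stands in for xenevtchn_handle */
struct cell { struct handle *payload; };  /* stands in for a Custom block's data */
typedef struct cell *value_t;             /* toy 'value': address of the block */

#define HEAP_CELLS 4
static struct cell heap[HEAP_CELLS];      /* toy garbage-collected heap */
static size_t live_idx;                   /* slot currently holding the block */

/* Allocate the toy Custom block and store the C pointer in it. */
static value_t toy_alloc_custom(struct handle *h)
{
    memset(heap, 0, sizeof(heap));
    live_idx = 0;
    heap[0].payload = h;
    return &heap[0];
}

/*
 * What another thread's collector may do while this thread holds no runtime
 * lock: move the block and leave garbage (here NULL) in the old slot.
 */
static void toy_gc_compact(void)
{
    size_t next = (live_idx + 1) % HEAP_CELLS;

    heap[next] = heap[live_idx];
    heap[live_idx].payload = NULL;
    live_idx = next;
}

/* Releasing the lock is modelled as "a collection happens right now". */
static void toy_enter_blocking_section(void) { toy_gc_compact(); }
static void toy_leave_blocking_section(void) { }

/* Stand-in for a long-running library call that takes the C handle. */
static int toy_blocking_call(struct handle *h)
{
    return h ? h->fd : -1;                /* -1: handed a stale/garbage pointer */
}

/* Unsafe ordering: the block is dereferenced inside the blocking section. */
static int stub_unsafe(value_t xce_val)
{
    int rc;

    toy_enter_blocking_section();
    rc = toy_blocking_call(xce_val->payload);  /* heap access without the lock */
    toy_leave_blocking_section();
    return rc;
}

/* Safe ordering: copy the C pointer to the stack while the lock is held. */
static int stub_safe(value_t xce_val)
{
    struct handle *xce = xce_val->payload;     /* read with the lock held */
    int rc;

    toy_enter_blocking_section();
    rc = toy_blocking_call(xce);               /* only C data is touched here */
    toy_leave_blocking_section();
    return rc;
}

int main(void)
{
    struct handle h = { 7 };                   /* constructed sample handle */
    int safe = stub_safe(toy_alloc_custom(&h));
    int unsafe = stub_unsafe(toy_alloc_custom(&h));

    /* Exit status 0 when the safe stub sees fd 7 and the unsafe one sees garbage. */
    return (safe == 7 && unsafe == -1) ? 0 : 1;
}
```

In the toy the stale read is a deterministic NULL; with a real collector the old slot holds whatever was written there next, which is why the commit messages describe the bug as subtle.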

3 years agotools/ocaml/libs: Allocate the correct amount of memory for Abstract_tag
Andrew Cooper [Tue, 31 Jan 2023 10:59:42 +0000 (10:59 +0000)]
tools/ocaml/libs: Allocate the correct amount of memory for Abstract_tag

caml_alloc() takes units of Wsize (word size), not bytes.  As a consequence,
we're allocating 4 or 8 times too much memory.

Ocaml has a helper, Wsize_bsize(), but it truncates cases which aren't an
exact multiple.  Use a BUILD_BUG_ON() to cover the potential for truncation,
as there's no rounding-up form of the helper.

Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
Fixes: d3e649277a13 ("ocaml: add mmap bindings implementation.")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 36eb2de31b6ecb8787698fb1a701bd708c8971b2)

3 years agotools/ocaml/libs: Don't declare stubs as taking void
Edwin Török [Thu, 12 Jan 2023 11:28:29 +0000 (11:28 +0000)]
tools/ocaml/libs: Don't declare stubs as taking void

There is no such thing as an Ocaml function (C stub or otherwise) taking no
parameters.  In the absence of any other parameters, unit is still passed.

This doesn't explode with any ABI we care about, but would malfunction for an
ABI environment such as stdcall.

Fixes: c3afd398ba7f ("ocaml: Add XS bindings.")
Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit ff8b560be80b9211c303d74df7e4b3921d2bb8ca)

3 years agotools/oxenstored: validate config file before live update
Edwin Török [Tue, 11 May 2021 15:56:50 +0000 (15:56 +0000)]
tools/oxenstored: validate config file before live update

The configuration file can contain typos or various errors that could prevent
live update from succeeding (e.g. a flag only valid on a different version).
Unknown entries in the config file would be ignored on startup normally,
add a strict --config-test that live-update can use to check that the config file
is valid *for the new binary*.

For compatibility with running old code during live update recognize
--live --help as an equivalent to --config-test.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit e6f07052ce4a0f0b7d4dc522d87465efb2d9ee86)

3 years agotools/ocaml/xb: Drop Xs_ring.write
Edwin Török [Fri, 16 Dec 2022 18:25:20 +0000 (18:25 +0000)]
tools/ocaml/xb: Drop Xs_ring.write

This function is unused (only Xs_ring.write_substring is used), and the
bytes/string conversion here is backwards: the C stub implements the bytes
version and then we use a Bytes.unsafe_of_string to convert a string into
bytes.

However the operation here really is read-only: we read from the string and
write it to the ring, so the C stub should implement the read-only string
version, and if needed we could use Bytes.unsafe_to_string to be able to send
'bytes'. However that is not necessary as the 'bytes' version is dropped above.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 01f139215e678c2dc7d4bb3f9f2777069bb1b091)

3 years agotools/ocaml/xb,mmap: Use Data_abstract_val wrapper
Edwin Török [Fri, 16 Dec 2022 18:25:10 +0000 (18:25 +0000)]
tools/ocaml/xb,mmap: Use Data_abstract_val wrapper

This is not strictly necessary since it is essentially a no-op currently: a
cast to void * and value *, even in OCaml 5.0.

However it does make it clearer that what we have here is not a regular OCaml
value, but one allocated with Abstract_tag or Custom_tag, and follows the
example from the manual more closely:
https://v2.ocaml.org/manual/intfc.html#ss:c-outside-head

It also makes it clearer that these modules have been reviewed for
compat with OCaml 5.0.

We cannot use OCaml finalizers here, because we want exact control over when
to unmap these pages from remote domains.

No functional change.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit d2ccc637111d6dbcf808aaffeec7a46f0b1e1c81)

3 years agotools/ocaml/xenctrl: Use larger chunksize in domain_getinfolist
Edwin Török [Tue, 1 Nov 2022 17:59:17 +0000 (17:59 +0000)]
tools/ocaml/xenctrl: Use larger chunksize in domain_getinfolist

domain_getinfolist() is quadratic with the number of domains, because of the
behaviour of the underlying hypercall.  Nevertheless, getting domain info in
blocks of 1024 is far more efficient than blocks of 2.

In a scalability testing scenario with ~1000 VMs, a combination of this and
the previous change takes xenopsd's wallclock time in domain_getinfolist()
down from 88% to 0.02%

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Tested-by: Pau Ruiz Safont <pau.safont@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 95db09b1b154fb72fad861815ceae1f3fa49fc4e)

3 years agotools/ocaml/xenctrl: Make domain_getinfolist tail recursive
Edwin Török [Tue, 1 Nov 2022 17:59:16 +0000 (17:59 +0000)]
tools/ocaml/xenctrl: Make domain_getinfolist tail recursive

domain_getinfolist() is quadratic with the number of domains, because of the
behaviour of the underlying hypercall.  xenopsd was further observed to be
wasting excessive quantities of time manipulating the list of already-obtained
domains.

Implement a tail recursive `rev_concat` equivalent to `concat |> rev`, and use
it instead of calling `@` multiple times.

An incidental benefit is that the list of domains will now be in domid order,
instead of having pairs of 2 domains changing direction every time.

In a scalability testing scenario with ~1000 VMs, a combination of this and
the subsequent change takes xenopsd's wallclock time in domain_getinfolist()
down from 88% to 0.02%

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Tested-by: Pau Ruiz Safont <pau.safont@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit c3b6be714c64aa62b56d0bce96f4b6a10b5c2078)

3 years agolibxl: fix guest kexec - skip cpuid policy
Jason Andryuk [Tue, 7 Feb 2023 16:01:49 +0000 (17:01 +0100)]
libxl: fix guest kexec - skip cpuid policy

When a domain performs a kexec (soft reset), libxl__build_pre() is
called with the existing domid.  Calling libxl__cpuid_legacy() on the
existing domain fails since the cpuid policy has already been set, and
the guest isn't rebuilt and doesn't kexec.

xc: error: Failed to set d1's policy (err leaf 0xffffffff, subleaf 0xffffffff, msr 0xffffffff) (17 = File exists): Internal error
libxl: error: libxl_cpuid.c:494:libxl__cpuid_legacy: Domain 1:Failed to apply CPUID policy: File exists
libxl: error: libxl_create.c:1641:domcreate_rebuild_done: Domain 1:cannot (re-)build domain: -3
libxl: error: libxl_xshelp.c:201:libxl__xs_read_mandatory: xenstore read failed: `/libxl/1/type': No such file or directory
libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain type for domid=1, assuming HVM

During a soft_reset, skip calling libxl__cpuid_legacy() to avoid the
issue.  Before commit 34990446ca91, the libxl__cpuid_legacy() failure
would have been ignored, so kexec would continue.

Fixes: 34990446ca91 ("libxl: don't ignore the return value from xc_cpuid_apply_policy")
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: 1e454c2b5b1172e0fc7457e411ebaba61db8fc87
master date: 2023-01-26 10:58:23 +0100

3 years agons16550: fix an incorrect assignment to uart->io_size
Ayan Kumar Halder [Tue, 7 Feb 2023 16:00:47 +0000 (17:00 +0100)]
ns16550: fix an incorrect assignment to uart->io_size

uart->io_size represents the size in bytes. Thus, when serial_port.bit_width
is assigned to it, it should be converted to size in bytes.

Fixes: 17b516196c ("ns16550: add ACPI support for ARM only")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
master commit: 352c89f72ddb67b8d9d4e492203f8c77f85c8df1
master date: 2023-01-24 16:54:38 +0100

3 years agobuild: fix building flask headers before descending in flask/ss/
Anthony PERARD [Tue, 7 Feb 2023 16:00:29 +0000 (17:00 +0100)]
build: fix building flask headers before descending in flask/ss/

Unfortunately, adding prerequisite to "$(obj)/ss/built_in.o" doesn't
work because we have "$(obj)/%/built_in.o: $(obj)/% ;" in Rules.mk.
So, make is allowed to try to build objects in "xsm/flask/ss/" before
generating the headers.

Adding a prerequisite on "$(obj)/ss" instead will fix the issue as
that's the target used to run make in this subdirectory.

Unfortunately, that target is also used when running `make clean`, so
we want to ignore it in this case. $(MAKECMDGOALS) can't be used in
this case as it is empty, but we can guess which operation is done by
looking at the list of loaded makefiles.

Fixes: 7a3bcd2babcc ("build: build everything from the root dir, use obj=$subdir")
Reported-by: "Daniel P. Smith" <dpsmith@apertussolutions.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: d60324d8af9404014cfcc37bba09e9facfd02fcf
master date: 2023-01-23 15:03:58 +0100

3 years agox86/shadow: fix PAE check for top-level table unshadowing
Jan Beulich [Tue, 7 Feb 2023 15:59:54 +0000 (16:59 +0100)]
x86/shadow: fix PAE check for top-level table unshadowing

Clearly within the for_each_vcpu() the vCPU of this loop is meant, not
the (loop invariant) one the fault occurred on.

Fixes: 3d5e6a3ff383 ("x86 hvm: implement HVMOP_pagetable_dying")
Fixes: ef3b0d8d2c39 ("x86/shadow: shadow_table[] needs only one entry for PV-only configs")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: f8fdceefbb1193ec81667eb40b83bc525cb71204
master date: 2023-01-20 09:23:42 +0100

3 years agox86/vmx: Support for CPUs without model-specific LBR
Andrew Cooper [Tue, 7 Feb 2023 15:59:14 +0000 (16:59 +0100)]
x86/vmx: Support for CPUs without model-specific LBR

Ice Lake (server at least) has both architectural LBR and model-specific LBR.
Sapphire Rapids does not have model-specific LBR at all.  I.e. On SPR and
later, model_specific_lbr will always be NULL, so we must make changes to
avoid reliably hitting the domain_crash().

The Arch LBR spec states that CPUs without model-specific LBR implement
MSR_DBG_CTL.LBR by discarding writes and always returning 0.

Do this for any CPU for which we lack model-specific LBR information.

Adjust the now-stale comment, now that the Arch LBR spec has created a way to
signal "no model specific LBR" to guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 3edca52ce736297d7fcf293860cd94ef62638052
master date: 2023-01-12 18:42:00 +0000

3 years agox86/vmx: Calculate model-specific LBRs once at start of day
Andrew Cooper [Tue, 7 Feb 2023 15:58:25 +0000 (16:58 +0100)]
x86/vmx: Calculate model-specific LBRs once at start of day

There is no point repeating this calculation at runtime, especially as it is
in the fallback path of the WRMSR/RDMSR handlers.

Move the infrastructure higher in vmx.c to avoid forward declarations,
renaming last_branch_msr_get() to get_model_specific_lbr() to highlight that
these are model-specific only.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: e94af0d58f86c3a914b9cbbf4d9ed3d43b974771
master date: 2023-01-12 18:42:00 +0000

3 years agoinclude/compat: produce stubs for headers not otherwise generated
Jan Beulich [Tue, 7 Feb 2023 15:57:52 +0000 (16:57 +0100)]
include/compat: produce stubs for headers not otherwise generated

Public headers can include other public headers. Such interdependencies
are retained in their compat counterparts. Since some compat headers are
generated only in certain configurations, the referenced headers still
need to exist. The lack thereof was observed with hvm/hvm_op.h needing
trace.h, where generation of the latter depends on TRACEBUFFER=y. Make
empty stubs in such cases (as generating the extra headers is relatively
slow and hence better to avoid). Changes to .config and incrementally
(re-)building is covered by the respective .*.cmd then no longer
matching the command to be used, resulting in the necessary re-creation
of the (possibly stub) header.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: 6bec713f871f21c6254a5783c1e39867ea828256
master date: 2023-01-12 16:17:54 +0100

3 years agotools: Fix build with recent QEMU, use "--enable-trace-backends"
Anthony PERARD [Tue, 7 Feb 2023 15:57:22 +0000 (16:57 +0100)]
tools: Fix build with recent QEMU, use "--enable-trace-backends"

The configure option "--enable-trace-backend" isn't accepted anymore
and we should use "--enable-trace-backends" instead which was
introduced in 2014 and allows multiple backends.

"--enable-trace-backends" was introduced by:
    5b808275f3bb ("trace: Multi-backend tracing")
The backward compatible option "--enable-trace-backend" is removed by
    10229ec3b0ff ("configure: remove backwards-compatibility and obsolete options")

As we already use ./configure options that wouldn't be accepted by
older versions of QEMU's configure, we will simply use the new spelling
for the option and avoid trying to detect which spelling to use.

We already make use of "--firmwarepath=" which was introduced by
    3d5eecab4a5a ("Add --firmwarepath to configure")
which already includes the new spelling for "--enable-trace-backends".

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
master commit: e66d450b6e0ffec635639df993ab43ce28b3383f
master date: 2023-01-11 10:45:29 +0100

3 years agox86/S3: Restore Xen's MSR_PAT value on S3 resume
Andrew Cooper [Tue, 7 Feb 2023 15:56:14 +0000 (16:56 +0100)]
x86/S3: Restore Xen's MSR_PAT value on S3 resume

There are two paths in the trampoline, and Xen's PAT needs setting up in both,
not just the boot path.

Fixes: 4304ff420e51 ("x86/S3: Drop {save,restore}_rest_processor_state() completely")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 4d975798e11579fdf405b348543061129e01b0fb
master date: 2023-01-10 21:21:30 +0000

3 years agod/changelog: finish 4.17.0+24-g2f8851c37f-1
Hans van Kranenburg [Wed, 1 Feb 2023 20:52:38 +0000 (21:52 +0100)]
d/changelog: finish 4.17.0+24-g2f8851c37f-1

3 years agoci: Update reason why arm64 crossbuild is disabled
Diederik de Haas [Wed, 21 Dec 2022 23:06:50 +0000 (00:06 +0100)]
ci: Update reason why arm64 crossbuild is disabled

The old reason why it was disabled was bug 982406 'mark markdown
Multi-Arch: foreign', but that was recently fixed.

Trying to enable it revealed another reason why it still doesn't work:
$ eatmydata apt-get build-dep ${HOST_ARCH:+--host-architecture ${HOST_ARCH} -Pcross,nocheck} --no-install-recommends -y $aptopts .
...
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
 ocaml:arm64 : Depends: gcc:arm64 but it is not installable
               Depends: binutils:arm64 but it is not installable
E: Unable to correct problems, you have held broken packages.

3 years agod/control: Drop markdown B-D for documentation
Diederik de Haas [Fri, 23 Dec 2022 08:40:25 +0000 (09:40 +0100)]
d/control: Drop markdown B-D for documentation

In upstream commit a2783e97fb220347bcf46583867782712a172710 the build
dependency on markdown was dropped and it has not been needed anymore
since Xen 4.12, so drop it in Debian too.

3 years agod/rules: use pkg-info.mk and do Maintainer parsing in d/rules
Maximilian Engelhardt [Thu, 26 Jan 2023 21:06:50 +0000 (22:06 +0100)]
d/rules: use pkg-info.mk and do Maintainer parsing in d/rules

Use DEB_VERSION and DEB_VERSION_UPSTREAM from
/usr/share/dpkg/pkg-info.mk as suggested by lintian. This fixes
'debian-rules-parses-dpkg-parsechangelog' in the lintian output.

Also move parsing of the Maintainer field in debian/control from our
delta queue to debian/rules and use the newly available DEB_VERSION in
the delta queue.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
3 years agod/xen-hypervisor-common.lintian-overrides: ignore false positive
Maximilian Engelhardt [Thu, 26 Jan 2023 20:18:30 +0000 (21:18 +0100)]
d/xen-hypervisor-common.lintian-overrides: ignore false positive

erroneous 'debian-news-entry-has-unknown-version' is emitted by lintian
due to #1021502.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
3 years agod/control: set Rules-Requires-Root: no
Maximilian Engelhardt [Mon, 23 Jan 2023 19:42:07 +0000 (20:42 +0100)]
d/control: set Rules-Requires-Root: no

As suggested by lintian. There are no differences in the built binaries.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
3 years agodebian: remove old leftovers from config file handling
Maximilian Engelhardt [Thu, 5 Jan 2023 23:49:43 +0000 (00:49 +0100)]
debian: remove old leftovers from config file handling

ae40dca3211ec35ca235a8a1f34c37e13093ff0d removed the call to the
debian/ucf-remove-fixup script from debian/rules. However the comment
explaining why this call was there was not removed. Additionally the
override_dh_ucf now only calls dh_ucf without doing anything else.

This commit removes the now unused debian/ucf-remove-fixup script, the
leftover comment referring to it and the dh_ucf override which doesn't
do anything but a call of dh_ucf.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
3 years agod/xen-utils-common.xendomains.default: adjust to upstream template
Maximilian Engelhardt [Mon, 30 Jan 2023 20:20:34 +0000 (21:20 +0100)]
d/xen-utils-common.xendomains.default: adjust to upstream template

Xen upstream sets XENDOMAINS_MIGRATE to an empty string by default. Do
the same in our template file.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
3 years agod/xen-utils-common.xendomains.default: remove XENDOMAINS_SYSRQ
Maximilian Engelhardt [Mon, 30 Jan 2023 20:15:57 +0000 (21:15 +0100)]
d/xen-utils-common.xendomains.default: remove XENDOMAINS_SYSRQ

XENDOMAINS_SYSRQ is currently not supported by our init scripts, so don't
mention it in the default config file.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
3 years agod/shuffle-boot-files: Also handle debug files
Hans van Kranenburg [Sat, 14 Jan 2023 23:40:49 +0000 (00:40 +0100)]
d/shuffle-boot-files: Also handle debug files

See the comment in the change for explanation. We do a fixup for file
names in /boot already, but the files in /usr/lib/debug should get the
same treatment!

Closes: #995233
Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
3 years agod/shuffle-boot-files: Add a note about d/not-installed
Hans van Kranenburg [Sat, 14 Jan 2023 23:16:52 +0000 (00:16 +0100)]
d/shuffle-boot-files: Add a note about d/not-installed

Add a hint about the fact that this boot/ location is also present in
d/not-installed. This might help someone looking at all of this for the
first time to discover the puzzle pieces that are involved.

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
3 years agodebian: split debug files out of xen-hypervisor-V-F and xen-utils-V
Maximilian Engelhardt [Wed, 18 Jan 2023 22:02:07 +0000 (23:02 +0100)]
debian: split debug files out of xen-hypervisor-V-F and xen-utils-V

The debug files have grown in size over time and can no longer be
considered small.  So we now ship them uncompressed in new -dbg
packages.

The files are installed into /usr/lib/debug at the same path as the
binaries they correspond to, as described in the "Best practices for
debug packages" (Section 6.8.9) in the Debian Developer's Reference.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
3 years agod/control: change Depends from lsb-base to sysvinit-utils
Maximilian Engelhardt [Wed, 18 Jan 2023 23:14:25 +0000 (00:14 +0100)]
d/control: change Depends from lsb-base to sysvinit-utils

lsb-base is now a transitional package depending on sysvinit-utils.
Thus, depending on lsb-base now gives the following lintian error:
E: xen-utils-common: depends-on-obsolete-package Depends: lsb-base

Keep lsb-base as an optional dependency to allow backporting to
bullseye.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
3 years agod/rules: 'dh_missing --fail-missing' is default in dh compat 13
Maximilian Engelhardt [Thu, 5 Jan 2023 23:47:48 +0000 (00:47 +0100)]
d/rules: 'dh_missing --fail-missing' is default in dh compat 13

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
3 years agodebian: switch to debhelper compat version 13
Maximilian Engelhardt [Thu, 5 Jan 2023 20:01:13 +0000 (21:01 +0100)]
debian: switch to debhelper compat version 13

Thanks to Diederik de Haas for helping with this.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
3 years agod/control: update build dependency to libext2fs-dev
Maximilian Engelhardt [Fri, 6 Jan 2023 00:02:09 +0000 (01:02 +0100)]
d/control: update build dependency to libext2fs-dev

This also works in bullseye, so backporting is easy.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
3 years agod/copyright: rewrite from scratch
Hans van Kranenburg [Sat, 14 Jan 2023 22:14:31 +0000 (23:14 +0100)]
d/copyright: rewrite from scratch

The d/copyright file was very old and outdated. Create an up to date one
now, also using the recommended semi-machine-readable format.

The following files in the upstream source tree were used to produce
this information:

  COPYING
  xen/COPYING
  xen/include/public/COPYING
  xen/common/COPYING
  xen/common/README.source
  xen/common/libelf/COPYING
  xen/crypto/README.source
  xen/include/crypto/README.source
  docs/README.source
  m4/README.source
  stubdom/vtpm/COPYING
  stubdom/COPYING
  tools/firmware/vgabios/COPYING
  tools/include/xen.COPYING.in
  tools/libacpi/COPYING
  tools/libs/guest/COPYING
  tools/xenmon/COPYING
  tools/libs/stat/COPYING
  tools/xenstore/COPYING

If license text is not available on a Debian system by default, the
included text was copied from files in the upstream LICENSES/ directory.

The 4.17 stable branch was used for this. When we advance the upstream
code to e.g. 4.18 we can check if there have been changes made to these
files and update the large copyright file.

Signed-off-by: Hans van Kranenburg <hans@knorrie.org>
3 years agoUpdate changelog for new upstream 4.17.0+24-g2f8851c37f
Maximilian Engelhardt [Wed, 1 Feb 2023 19:28:01 +0000 (20:28 +0100)]
Update changelog for new upstream 4.17.0+24-g2f8851c37f

[git-debrebase changelog: new upstream 4.17.0+24-g2f8851c37f]

3 years agoUpdate to upstream 4.17.0+24-g2f8851c37f
Maximilian Engelhardt [Wed, 1 Feb 2023 19:28:01 +0000 (20:28 +0100)]
Update to upstream 4.17.0+24-g2f8851c37f

[git-debrebase anchor: new upstream 4.17.0+24-g2f8851c37f, merge]

3 years agoRevert "tools/xenstore: simplify loop handling connection I/O"
Jason Andryuk [Thu, 26 Jan 2023 10:00:24 +0000 (11:00 +0100)]
Revert "tools/xenstore: simplify loop handling connection I/O"

I'm observing guest kexec trigger xenstored to abort on a double free.

gdb output:
Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140645614258112) at ./nptl/pthread_kill.c:44
44    ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
    at ./nptl/pthread_kill.c:44
    at ./nptl/pthread_kill.c:78
    at ./nptl/pthread_kill.c:89
    at ../sysdeps/posix/raise.c:26
    at talloc.c:119
    ptr=ptr@entry=0x559fae724290) at talloc.c:232
    at xenstored_core.c:2945
(gdb) frame 5
    at talloc.c:119
119            TALLOC_ABORT("Bad talloc magic value - double free");
(gdb) frame 7
    at xenstored_core.c:2945
2945                talloc_increase_ref_count(conn);
(gdb) p conn
$1 = (struct connection *) 0x559fae724290

Looking at a xenstore trace, we have:
IN 0x559fae71f250 20230120 17:40:53 READ (/local/domain/3/image/device-model-dom
id )
wrl: dom    0      1  msec      10000 credit     1000000 reserve        100 disc
ard
wrl: dom    3      1  msec      10000 credit     1000000 reserve        100 disc
ard
wrl: dom    0      0  msec      10000 credit     1000000 reserve          0 disc
ard
wrl: dom    3      0  msec      10000 credit     1000000 reserve          0 disc
ard
OUT 0x559fae71f250 20230120 17:40:53 ERROR (ENOENT )
wrl: dom    0      1  msec      10000 credit     1000000 reserve        100 disc
ard
wrl: dom    3      1  msec      10000 credit     1000000 reserve        100 disc
ard
IN 0x559fae71f250 20230120 17:40:53 RELEASE (3 )
DESTROY watch 0x559fae73f630
DESTROY watch 0x559fae75ddf0
DESTROY watch 0x559fae75ec30
DESTROY watch 0x559fae75ea60
DESTROY watch 0x559fae732c00
DESTROY watch 0x559fae72cea0
DESTROY watch 0x559fae728fc0
DESTROY watch 0x559fae729570
DESTROY connection 0x559fae724290
orphaned node /local/domain/3/device/suspend/event-channel deleted
orphaned node /local/domain/3/device/vbd/51712 deleted
orphaned node /local/domain/3/device/vkbd/0 deleted
orphaned node /local/domain/3/device/vif/0 deleted
orphaned node /local/domain/3/control/shutdown deleted
orphaned node /local/domain/3/control/feature-poweroff deleted
orphaned node /local/domain/3/control/feature-reboot deleted
orphaned node /local/domain/3/control/feature-suspend deleted
orphaned node /local/domain/3/control/feature-s3 deleted
orphaned node /local/domain/3/control/feature-s4 deleted
orphaned node /local/domain/3/control/sysrq deleted
orphaned node /local/domain/3/data deleted
orphaned node /local/domain/3/drivers deleted
orphaned node /local/domain/3/feature deleted
orphaned node /local/domain/3/attr deleted
orphaned node /local/domain/3/error deleted
orphaned node /local/domain/3/console/backend-id deleted

and no further output.

The trace shows that DESTROY was called for connection 0x559fae724290,
but that is the same pointer (conn) main() was looping through from
connections.  So it wasn't actually removed from the connections list?

Reverting commit e8e6e42279a5 "tools/xenstore: simplify loop handling
connection I/O" fixes the abort/double free.  I think the use of
list_for_each_entry_safe is incorrect.  list_for_each_entry_safe makes
traversal safe for deleting the current iterator, but RELEASE/do_release
will delete some other entry in the connections list.  I think the
observed abort is because list_for_each_entry has next pointing to the
deleted connection, and it is used in the subsequent iteration.

Add a comment explaining the unsuitability of list_for_each_entry_safe.
Also notice that the old code takes a reference on next which would
prevent a use-after-free.

This reverts commit e8e6e42279a5723239c5c40ba4c7f579a979465d.

This is XSA-425/CVE-2022-42330.

Fixes: e8e6e42279a5 ("tools/xenstore: simplify loop handling connection I/O")
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
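
A small C model of the traversal hazard described in the commit message above, added here as an illustration and not taken from xenstored. The struct, the static pool and every function name are constructed; a `freed` flag stands in for memory a real daemon would already have returned to the allocator. It counts visits to freed entries when the successor is only cached, and when a reference is held on it.

```c
#include <stddef.h>
#include <stdio.h>
/* Constructed model, not xenstored code: "freed" stands in for memory
 * already handed back to the allocator, so nothing here is undefined. */
struct conn {
    struct conn *next;
    int id, refs, freed;
    int releases;   /* id of the connection this entry's request releases, or -1 */
};
static struct conn pool[4], *head;

/* Drop a reference; at zero run the "destructor": unlink, then free. */
static int conn_put(struct conn *c)
{
    if (--c->refs > 0) return 0;
    for (struct conn **pp = &head; *pp; pp = &(*pp)->next)
        if (*pp == c) { *pp = c->next; break; }
    c->freed = 1;
    return 1;
}

/* Returns how many already-freed entries one pass over the list steps onto. */
static int walk(int hold_next)
{
    int stale = 0;
    struct conn *c, *next = head;
    if (hold_next && next) next->refs++;
    while ((c = next) != NULL) {
        if (c->freed) stale++;                   /* real daemon: use-after-free */
        next = c->next;                          /* successor cached, as _safe does */
        if (hold_next && next) next->refs++;     /* pin it before handling c */
        if (hold_next && conn_put(c)) continue;  /* c was released while pinned */
        if (!c->freed && c->releases >= 0)       /* input handling: RELEASE */
            for (struct conn *v = head; v; v = v->next)
                if (v->id == c->releases) { conn_put(v); break; }
    }
    return stale;
}

static int demo(int hold_next)
{
    for (int i = 0; i < 4; i++)
        pool[i] = (struct conn){ i < 3 ? &pool[i + 1] : NULL, i, 1, 0, -1 };
    pool[1].releases = 2;   /* entry 1 releases its own list successor */
    head = &pool[0];
    return walk(hold_next);
}

int main(void) { printf("stale: cached %d, pinned %d\n", demo(0), demo(1)); return 0; }
```

In the pinned walk the released entry stays linked until the walk itself drops the last reference, so its successor is read before the entry is destroyed.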
3 years ago debian/changelog: finish 4.17.0-1
Maximilian Engelhardt [Wed, 21 Dec 2022 21:36:11 +0000 (22:36 +0100)]
debian/changelog: finish 4.17.0-1

3 years ago d/control: update Build-Depends for ocaml
Maximilian Engelhardt [Wed, 21 Dec 2022 21:19:39 +0000 (22:19 +0100)]
d/control: update Build-Depends for ocaml

ocaml-native-compilers is not shipped in Debian since buster, ocaml-nox
is a transitional package for ocaml in unstable/testing.
Since ocaml depends on ocaml-nox in bullseye, it doesn't affect
backports.

3 years ago d/control: Update Standards-Version to 4.6.2
Maximilian Engelhardt [Wed, 21 Dec 2022 21:18:30 +0000 (22:18 +0100)]
d/control: Update Standards-Version to 4.6.2

no changes needed

3 years ago Update changelog for new upstream 4.17.0
Maximilian Engelhardt [Wed, 21 Dec 2022 21:03:44 +0000 (22:03 +0100)]
Update changelog for new upstream 4.17.0

[git-debrebase changelog: new upstream 4.17.0]

3 years ago Update to upstream 4.17.0
Maximilian Engelhardt [Wed, 21 Dec 2022 21:03:43 +0000 (22:03 +0100)]
Update to upstream 4.17.0

[git-debrebase anchor: new upstream 4.17.0, merge]

3 years ago tools/oxenstored: Render backtraces more nicely in Syslog
Andrew Cooper [Thu, 1 Dec 2022 21:06:25 +0000 (21:06 +0000)]
tools/oxenstored: Render backtraces more nicely in Syslog

fallback_exception_handler feeds a string with embedded newlines directly into
syslog().  While this is an improvement on getting nothing, syslogd escapes
all control characters it gets, and emits one (long) log line.

Fix the problem generally in the syslog stub.  As we already have a local copy
of the string, split it in place and emit one syslog() call per line.

Also tweak Logging.msg_of to avoid putting an extra newline on a string which
already ends with one.

Fixes: ee7815f49faf ("tools/oxenstored: Set uncaught exception handler")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit d2162d884cba0ff7b2ac0d832f4e044444bda2e1)

3 years ago tools/oxenstored/syslog: Avoid potential NULL dereference
Edwin Török [Tue, 8 Nov 2022 14:24:19 +0000 (14:24 +0000)]
tools/oxenstored/syslog: Avoid potential NULL dereference

strdup() may return NULL.  Check for this before passing to syslog().

Drop const from c_msg.  It is bogus, as demonstrated by the need to cast to
void * in order to free the memory.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit acd3fb6d65905f8a185dcb9fe6a330a591b96203)

3 years ago tools/oxenstored: Set uncaught exception handler
Edwin Török [Mon, 7 Nov 2022 17:41:36 +0000 (17:41 +0000)]
tools/oxenstored: Set uncaught exception handler

Unhandled exceptions go to stderr by default, but this doesn't typically work
for oxenstored because:
 * daemonize reopens stderr as /dev/null
 * systemd redirects stderr to /dev/null too

Debugging an unhandled exception requires reproducing the issue locally when
using --no-fork, and is not conducive to figuring out what went wrong on a
remote system.

Install a custom handler which also tries to render the backtrace to the
configured syslog facility, and DAEMON|ERR otherwise.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit ee7815f49faf743e960dac9e72809eb66393bc6d)

3 years ago tools/oxenstored: Log live update issues at warning level
Edwin Török [Tue, 8 Nov 2022 08:57:47 +0000 (08:57 +0000)]
tools/oxenstored: Log live update issues at warning level

During live update, oxenstored tries a best effort approach to recover as many
domains and information as possible even if it encounters errors restoring
some domains.

However, logging about misunderstood input is more severe than simply info.
Log it at warning instead.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 3f02e0a70fe9f8143454b742563433958d4a87f8)

3 years ago tools/oxenstored: Keep /dev/xen/evtchn open across live update
Edwin Török [Thu, 3 Nov 2022 15:31:39 +0000 (15:31 +0000)]
tools/oxenstored: Keep /dev/xen/evtchn open across live update

Closing the evtchn handle will unbind and free all local ports.  The new
xenstored would need to rebind all evtchns, which is work that we don't want
or need to be doing during the critical handover period.

However, it turns out that the Windows PV drivers also rebind their local port
too across suspend/resume, leaving (o)xenstored with a stale idea of the
remote port to use.  In this case, reusing the established connection is the
only robust option.

Therefore:
 * Have oxenstored open /dev/xen/evtchn without CLOEXEC at start of day.
 * Extend the handover information with the evtchn fd, domexc virq local port,
   and the local port number for each domain connection.
 * Have (the new) oxenstored recover the open handle using Xeneventchn.fdopen,
   and use the provided local ports rather than trying to rebind them.

When this new information isn't present (i.e. live updating from an oxenstored
prior to this change), the best-effort status quo will have to do.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 9b224c25293a53fcbe32da68052d861dda71a6f4)

3 years ago tools/oxenstored: Rework Domain evtchn handling to use port_pair
Andrew Cooper [Wed, 30 Nov 2022 11:59:34 +0000 (11:59 +0000)]
tools/oxenstored: Rework Domain evtchn handling to use port_pair

Inter-domain event channels are always a pair of local and remote ports.
Right now the handling is asymmetric, caused by the fact that the evtchn is
bound after the associated Domain object is constructed.

First, move binding of the event channel into the Domain.make() constructor.
This means the local port no longer needs to be an option.  It also removes
the final callers of Domain.bind_interdomain.

Next, introduce a new port_pair type to encapsulate the fact that these two
should be updated together, and replace the previous port and remote_port
fields.  This refactoring also changes the Domain.get_port interface (removing
an option) so take the opportunity to name it get_local_port instead.

Also, this fixes a use-after-free risk with Domain.close.  Once the evtchn has
been unbound, the same local port number can be reused for a different
purpose, so explicitly invalidate the ports to prevent their accidental misuse
in the future.

This also cleans up some of the debugging, to always print a port pair.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit df2db174b36eba67c218763ef621c67912202fc6)

3 years ago tools/oxenstored: Implement Domain.rebind_evtchn
Andrew Cooper [Wed, 30 Nov 2022 11:55:58 +0000 (11:55 +0000)]
tools/oxenstored: Implement Domain.rebind_evtchn

Generally speaking, the event channel local/remote port is fixed for the
lifetime of the associated domain object.  The exception to this is a
secondary XS_INTRODUCE (defined to re-bind to a new event channel) which pokes
around at the domain object's internal state.

We need to refactor the evtchn handling to support live update, so start by
moving the relevant manipulation into Domain.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit aecdc28d9538ca2a1028ef9bc6550cb171dbbed4)